Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Enable cast string to int tests and fix compatibility issue #453

Merged
merged 2 commits into from
May 21, 2024

Conversation

andygrove
Copy link
Member

Which issue does this PR close?

Closes #431

Rationale for this change

Enable cast string to int as a compatible expression.

Also, there is a performance improvement with these changes:

cast_string_to_int/cast_string_to_i16
                        time:   [17.445 µs 17.699 µs 18.019 µs]
                        change: [-41.405% -40.622% -39.805%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

  • Enable tests for cast string to int
  • Include original input value in error message instead of trimmed input value
  • Simplify code and remove whitespace handling code that was not needed since we already trim the input values

How are these changes tested?

Enabled the tests that were previously disabled

@@ -82,7 +82,7 @@ macro_rules! cast_utf8_to_int {
for i in 0..len {
if $array.is_null(i) {
cast_array.append_null()
} else if let Some(cast_value) = $cast_method($array.value(i).trim(), $eval_mode)? {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We originally trimmed the input strings here, so error messages did not have access to the original inputs


if state == State::ParseSignAndDigits {
if !parsed_sign {
for (i, ch) in trimmed_str.char_indices() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We process the trimmed string here and we no longer need the logic for skipping leading and trailing whitespace

@@ -1070,7 +1050,7 @@ fn do_cast_string_to_int<
if ch == '.' {
if eval_mode == EvalMode::Legacy {
// truncate decimal in legacy mode
state = State::ParseFractionalDigits;
parse_sign_and_digits = false;
continue;
} else {
return none_or_err(eval_mode, type_name, str);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error messages refer to the original input string, not the trimmed version

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending CI

@parthchandra
Copy link
Contributor

So do you thing the perf improvement is because we are no longer trimming?

@andygrove
Copy link
Member Author

So do you thing the perf improvement is because we are no longer trimming?

We are still trimming, but we are no longer performing the redundant conditional logic in the main loop to skip leading and trailing whitespace.

@andygrove andygrove merged commit de8fe45 into apache:main May 21, 2024
41 checks passed
@andygrove andygrove deleted the fix-cast-string-to-int branch May 21, 2024 02:02
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
…che#453)

* simplify cast string to int logic and use untrimmed string in error messages

* remove state enum

(cherry picked from commit de8fe45)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: CAST string to integer does not handle all invalid inputs
4 participants